Encoding Distributional Semantics into Triple-Based Knowledge Ranking for Document Enrichment

نویسندگان

  • Muyu Zhang
  • Bing Qin
  • Mao Zheng
  • Graeme Hirst
  • Ting Liu
چکیده

Document enrichment focuses on retrieving relevant knowledge from external resources, which is essential because text is generally replete with gaps. Since conventional work primarily relies on special resources, we instead use triples of Subject, Predicate, Object as knowledge and incorporate distributional semantics to rank them. Our model first extracts these triples automatically from raw text and converts them into real-valued vectors based on the word semantics captured by Latent Dirichlet Allocation. We then represent these triples, together with the source document that is to be enriched, as a graph of triples, and adopt a global iterative algorithm to propagate relevance weight from source document to these triples so as to select the most relevant ones. Evaluated as a ranking problem, our model significantly outperforms multiple strong baselines. Moreover, we conduct a task-based evaluation by incorporating these triples as additional features into document classification and enhances the performance by 3.02%.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Triple based Background Knowledge Ranking for Document Enrichment

Document enrichment is the task of retrieving additional knowledge from external resource over what is available through source document. This task is essential because of the phenomenon that text is generally replete with gaps and ellipses since authors assume a certain amount of background knowledge. The recovery of these gaps is intuitively useful for better understanding of document. Conven...

متن کامل

Toward a Deep Neural Approach for Knowledge-Based IR

This paper tackles the problem of the semantic gap between a document and a query within an ad-hoc information retrieval task. In this context, knowledge bases (KBs) have already been acknowledged as valuable means since they allow the representation of explicit relations between entities. However, they do not necessarily represent implicit relations that could be hidden in a corpora. This latt...

متن کامل

Distributional and Neural Models for Extracting Manipulation-Relevant Relations from Text Corpora

In this paper we present a novel approach based on neural network techniques to extract common sense knowledge from text corpora. We apply this approach to extract of common sense knowledge about everyday objects that can be used by intelligent machines, e.g. robots, to support planning of tasks that involve object manipulation. The knowledge we extract is constituted by relations that relate s...

متن کامل

Text-Based Ontology Enrichment Using Hierarchical Self-organizing Maps

The success of the Semantic Web research is dependent upon the construction of complete and reliable domain ontologies. In this paper we describe an unsupervised framework for domain ontology enrichment based on mining domain text corpora. Specifically, we enrich the hierarchical backbone of an existing ontology, i.e. its taxonomy, with new domain-specific concepts. The framework is based on an...

متن کامل

Novel Ranking-Based Lexical Similarity Measure for Word Embedding

Distributional semantics models derive word space from linguistic items in context. Meaning is obtained by defining a distance measure between vectors corresponding to lexical entities. Such vectors present several problems. In this paper we provide a guideline for post process improvements to the baseline vectors. We focus on refining the similarity aspect, address imperfections of the model b...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015